Create the spider with this URL:
https://books.toscrape.com/
Make sure you are in the spiders directory:
cd bookscraper/bookscraper/spiders
Create the spider from inside the spiders directory with this command:
scrapy genspider bookspider https://books.toscrape.com/
Provide a name for the spider along with the base URL.
This is the page the spider will base itself on: https://books.toscrape.com/
The allowed_domains list restricts crawling to the listed domains only, which limits the number of URLs the spider can connect to.
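To make the effect of allowed_domains concrete, here is a minimal sketch of the kind of check it implies. The is_allowed helper is hypothetical and is not part of Scrapy; it is a simplified stand-in for Scrapy's offsite filtering, assuming a URL is in scope when its host matches an allowed domain or one of its subdomains.

```python
from urllib.parse import urlparse


# Hypothetical helper, not part of Scrapy: a simplified stand-in for the
# offsite filtering that allowed_domains enables.
def is_allowed(url, allowed_domains):
    host = urlparse(url).hostname or ""
    # In scope if the host equals an allowed domain or is a subdomain of one.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


allowed_domains = ["books.toscrape.com"]

# A page on the book site is in scope; an unrelated site is not.
in_scope = is_allowed("https://books.toscrape.com/catalogue/page-2.html", allowed_domains)
out_of_scope = is_allowed("https://example.com/", allowed_domains)
```

With this check, links that lead away from books.toscrape.com would be skipped rather than followed.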
The start_urls list holds the URLs the spider requests first.
The response passed to the parse method is the same kind of object that the fetch command returns, as seen in the next lesson.